A Reexamination of Lord’s Wald Test for Differential Item Functioning Using Item Response Theory and Modern Error Estimation

Authors

  • Michelle M. Langer
  • David Thissen
Abstract

The detection of differential item functioning (DIF) is an essential step in increasing the validity of a test for all groups. The item response theory (IRT) model comparison approach has been shown to be the most flexible and powerful method for DIF detection; however, it is computationally intensive, requiring many model refittings. The Wald test, originally employed by Lord for DIF detection, is asymptotically equivalent to this approach and requires only one model fitting. In this research, the Wald test for DIF detection was improved from Lord's original conception through modern error estimation, concurrent calibration, maximum marginal likelihood item parameter estimation, conditional DIF tests, and extensions to commonly used IRT models as well as multiple groups. This research examined the Type I error and power of the Wald test by varying the magnitude of DIF, the mean difference between groups, test length, and the sample size per group. Data were simulated under the graded response model and the three-parameter logistic (3PL) model. An additional simulation study compared the IRT model comparison approach to the Wald test under the two-parameter logistic model. The results indicated that the Wald test performs well in detecting DIF. The performance improves with larger sample sizes, greater magnitudes of DIF, greater test lengths, and the random assignment estimation procedure. The use of larger sample sizes and greater test lengths is most critical for situations employing the 3PL model. The Wald test also performs well compared to the IRT model comparison approach, although the results of the two methods should converge asymptotically. This research also demonstrated the flexibility of the Wald test through its straightforward extension to multiple groups. An example was used to demonstrate the effectiveness of the Wald test and compare it to the IRT model comparison approach. The Wald test was able to accurately identify the source of DIF. The IRT model comparison approach, by contrast, appeared more powerful but confounded the results of the DIF tests because it combined groups. Several considerations for designing a DIF detection framework given multiple groups were outlined, particularly the superiority of the Wald test when given unequal sample sizes.
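As a rough illustration of the statistic the abstract refers to, the sketch below computes a Lord-style Wald chi-square for one item under the two-parameter logistic model. It is not the authors' procedure: it assumes the two groups' (a, b) estimates are already on a common scale and are statistically independent, so the covariance of the difference is the sum of the two covariance matrices. The function name `wald_dif_2pl` and all input values are hypothetical.

```python
import math

def wald_dif_2pl(est_ref, cov_ref, est_foc, cov_foc):
    """Wald chi-square (2 df) for equal (a, b) parameters in two groups.

    est_*: (a, b) estimates for the reference and focal group.
    cov_*: 2x2 error covariance matrices as nested lists.
    Assumption: independent group estimates on a common scale.
    """
    # Difference vector d = reference - focal
    d0 = est_ref[0] - est_foc[0]
    d1 = est_ref[1] - est_foc[1]
    # Covariance of the difference under independence: S = cov_ref + cov_foc
    s00 = cov_ref[0][0] + cov_foc[0][0]
    s01 = cov_ref[0][1] + cov_foc[0][1]
    s11 = cov_ref[1][1] + cov_foc[1][1]
    det = s00 * s11 - s01 * s01
    if det <= 0:
        raise ValueError("summed covariance matrix is not positive definite")
    # Quadratic form d' S^{-1} d, using the closed-form 2x2 inverse
    chi2 = (s11 * d0 * d0 - 2.0 * s01 * d0 * d1 + s00 * d1 * d1) / det
    # For 2 degrees of freedom the chi-square upper tail is exp(-x / 2)
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value

# Hypothetical item: equal slopes, difficulty shifted by 0.5 in the focal group
chi2, p = wald_dif_2pl(
    (1.2, 0.0), [[0.01, 0.0], [0.0, 0.01]],
    (1.2, 0.5), [[0.01, 0.0], [0.0, 0.01]],
)
print(round(chi2, 3), round(p, 5))
```

Only one fitted model per group is needed to obtain the estimates and their covariance matrices, which is the computational advantage the abstract contrasts with repeated refitting in the model comparison approach.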

Similar resources

Selecting the Best Fit Model in Cognitive Diagnostic Assessment: Differential Item Functioning Detection in the Reading Comprehension of the PhD Nationwide Admission Test

This study was an attempt to provide detailed information of the strengths and weaknesses of test takers' real ability through cognitive diagnostic assessment, and to detect differential item functioning in each test item. The rationale for using CDA was that it estimates an item's discrimination power, whereas classical test theory or item response theory depicts between rather than within item mu...

Differential Item Functioning (DIF) in Terms of Gender in the Reading Comprehension Subtest of a High-Stakes Test

Validation is an important enterprise especially when a test is a high stakes one. Demographic variables like gender and field of study can affect test results and interpretations. Differential Item Functioning (DIF) is a way to make sure that a test does not favor one group of test takers over the others. This study investigated DIF in terms of gender in the reading comprehension subtest (35 i...

A confirmatory study of Differential Item Functioning on EFL reading comprehension

The present study aimed at investigating DIF sources on an EFL reading comprehension test. Accordingly, 2 DIF detection methods, logistic regression (LR) and item response theory (IRT), were used to flag emergent DIF of 203 (110 females & 93 males) Iranian EFL examinees’ performance on a reading comprehension test. Seven hypothetical DIF sources were examin...

Longitudinal Differential Item Functioning Detection Using Bifactor Models and the Wald Test

The use of longitudinal data for studying cross-time changes is built on the key assumption that properties (e.g., slopes and intercepts) of the repeatedly-used items remain unchanged over time. True changes in the latent variables are indistinguishable from item-level changes when items exhibit differential item functioning (DIF) across time points. To date, no research has extended the modifi...

Selection the best Method of Equating Using Anchor-Test Design in Item Response Theory

Problem. The equating process is used to compare scores on two different tests with the same theme. The goal of this research is to find the best method of equating data using the logistic model. Method. We use data from the Ph.D. test in the Statistics major for two consecutive years, 92 and 93. For the analysis, we specifically use the tests of the Statistics major ...

Publication date: 2008